Inside Macintosh: Programming With the Text Encoding Conversion Manager /
Appendix E - Conventions for Unicode Text in the Mac OS


File Requirements

Most documents created on a Mac OS-based system use a richer text model than pure Unicode, so the emphasis here is on easy interchange with other platforms. In particular, an application should be able to

File Types

The file type 'utxt' has been registered for UTF-16 plain text documents. The (optional) scrap type 'utxt' is also registered for UTF-16 Clipboard text.

Whether it is useful to register a file type or scrap type for UTF-8 text is currently under discussion. Like other documents and text that use WorldScript encodings, plain UTF-8 documents could use the file and scrap type 'TEXT'. UTF-8 is compatible with the assumptions that govern WorldScript encodings, and these encodings are not specifically identified in 'TEXT' files or Clipboard contents.

File Content

A plain text Unicode document, in a file or on the Clipboard, can contain any valid character from Unicode 2.0 or later. In particular, it can contain control characters in the ranges U+0000 through U+001F and U+0080 through U+009F. It may also contain codes in the Corporate and Private Use Zones, although these may not interchange properly.

The byte-order mark U+FEFF may be present at the beginning of the content. If it is absent in UTF-16 content, big-endian order is assumed.
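
As an illustration of the byte-order rule, the following Python sketch decodes UTF-16 content and falls back to big-endian order when no byte-order mark is present. The function name decode_utf16_plain_text is hypothetical and not part of any Mac OS API. The fallback is written out explicitly because Python's generic "utf-16" codec assumes the platform's native byte order when the mark is absent.

```python
import codecs


def decode_utf16_plain_text(data: bytes) -> str:
    """Decode UTF-16 plain text, honoring a leading U+FEFF byte-order mark.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        # Big-endian mark (FE FF): strip it and decode as big-endian.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # Little-endian mark (FF FE): strip it and decode as little-endian.
        return data[2:].decode("utf-16-le")
    # No byte-order mark: assume big-endian order, as the convention states.
    return data.decode("utf-16-be")
```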

Creating Content

When creating file content, write line and paragraph separators using the special Unicode characters intended for this purpose--U+2028 and U+2029--instead of using some combination of CR and LF. This makes the content more portable; when the content is read on a particular platform, these Unicode separators can be converted to the separators customary for that platform.
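
A minimal Python sketch of this rule follows. It assumes the text is held in memory as a list of paragraphs, each a list of lines, and that the output is big-endian UTF-16 with a leading byte-order mark. The function name and the data layout are illustrative assumptions, not part of the conventions themselves.

```python
import codecs

LINE_SEPARATOR = "\u2028"       # Unicode LINE SEPARATOR
PARAGRAPH_SEPARATOR = "\u2029"  # Unicode PARAGRAPH SEPARATOR


def encode_paragraphs(paragraphs):
    """Serialize paragraphs (each a list of lines) as UTF-16 plain text.

    Hypothetical helper: lines within a paragraph are joined with U+2028,
    paragraphs are joined with U+2029, and no CR or LF is written.
    """
    body = PARAGRAPH_SEPARATOR.join(
        LINE_SEPARATOR.join(lines) for lines in paragraphs
    )
    # Assumption: emit big-endian UTF-16 preceded by the byte-order mark.
    return codecs.BOM_UTF16_BE + body.encode("utf-16-be")
```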

Reading Content

When reading file content, accept the Unicode line and paragraph separators and treat them as such. In addition, treat any of the following as paragraph separators: LF, CR, CRLF.
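
The reading rule can be sketched in Python as follows. The function name split_paragraphs and the nested-list result are assumptions made for illustration. The pattern lists CRLF before CR and LF so that a CRLF pair counts as a single paragraph break.

```python
import re

# Paragraph breaks: CRLF (tried first), CR, LF, or U+2029.
_PARAGRAPH_BREAK = re.compile("\r\n|\r|\n|\u2029")


def split_paragraphs(text):
    """Split decoded text into paragraphs, each a list of lines.

    Hypothetical helper: U+2029, LF, CR, and CRLF end a paragraph;
    U+2028 separates lines within a paragraph.
    """
    return [
        paragraph.split("\u2028")
        for paragraph in _PARAGRAPH_BREAK.split(text)
    ]
```

On a given platform, the resulting lines and paragraphs can then be written back out with whatever separators are customary there.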

When converting content to Mac OS encodings, set the kUnicodeLooseMappingsBit control flag. (You may use other control bits in addition to this one.)


© Apple Computer, Inc.
13 NOV 1997